DETAILED ACTION 

This office action is responsive to communications filed on December 21, 2007. 
Claim 16 has been cancelled; Claims 1, 19, 22, 24 and 26 have been amended; and 
Claims 1-15 and 17-28 are pending in the application. 

Claim Objections 

1. Claim 19 is objected to because of the following informalities: Claim 19 recites 
the limitation "wherein the chunk maps stored in a chunk table are employed to 
determine the first chunk" in line 3. There is insufficient antecedent basis for the "chunk 
maps" mentioned in this limitation. Appropriate correction is required. 

Claim Rejections - 35 USC § 112 

2. The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 

3. Claim 19 is rejected under 35 U.S.C. 112, first paragraph, as failing to comply 
with the written description requirement. Claim 19, as currently amended recites the 
new limitation "wherein the chunk maps stored in a chunk table are employed to 
determine the first chunk". The specification does not provide the necessary 
antecedent basis for this limitation in the claim. The specification does not disclose 
chunk maps being stored in a chunk table. 

Claim Rejections - 35 USC § 102 

4. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

5. Claims 24-25 are rejected under 35 U.S.C. 102(e) as being anticipated by Evans 
et al. (US 2004/0030683). 

Regarding Claim 24, Evans teaches a data packet transmitted between two or 
more computer components that facilitates document re-crawl, the data packet 
comprising: 

a chunk header that includes metadata associated with the data packet (Fig. 3 
shows a site map (chunk) having a header 324 that includes metadata; "Metadata may 
also be obtained from the media/metadata transport stream, such as TCP/IP (e.g., 
packets)" - See [0017]); 

an offset section that provides offset information associated with document files 
(Document files are offset into separate "levels" as illustrated in Fig. 3; "the number of 
levels in a site map is heuristically determined. For example, a specific web site may be 
exhaustively searched upon first being encountered and it is determined that six levels 
comprise information related to streaming media and/or multimedia (i.e., the target 
content)" - See [0026]); and, 

the document files that include content found on the Internet (The site map 
(chunk) shown in Fig. 3 comprises multiple document files which are content found on 
the Internet, such as web pages, music objects, video objects, etc.), 

wherein the average of the at least one of the properties of all the document files 
determines if the document should be re-crawled ("the system 100 stores auxiliary 
information pertaining to the encountered web sites in database 400" - See [0032]; "The 
system uses the auxiliary information to determine how often to conduct a recrawl. How 
often and when a recrawl is to be conducted may be determined statistically, 
heuristically, and/or by user input" - See [0033]). 

Regarding Claim 25, Evans teaches the document files comprising at least one 
of an HTML file, a PDF file, a PS file, a PPT file, an XLS file and a DOC file ("Web page 
content includes HTML" - See [0017]). 

Claim Rejections - 35 USC § 103 

6. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

7. Claims 1-4, 6-15, 17, 19-23 and 26-28 are rejected under 35 U.S.C. 103(a) as 
being unpatentable over Najork et al. (US 6,263,364) in view of Evans et al. (US 
2004/0030683). 

Regarding Claim 1, Najork teaches an indexer that places items with similar 
properties into respective chunks wherein the properties are at least one of average 
time between change or average importance of documents comprising a particular 
chunk ("the document is assigned to a priority level subqueue based on a predefined 
set of criteria 282 are satisfied, including but not limited to: ...the document's rate of 
change, based on (a) its modification date and time" - See Col. 11, lines 48-55; The 
documents (items) with similar importance (based on their rate of change) are placed in 
separate queues (chunks) with a corresponding priority level). 

Although Najork teaches properties being associated with their respective chunk, 
Najork does not teach storing these properties in a chunk map. 

Evans teaches a chunk map that stores properties associated with chunks ("the 
system 100 stores auxiliary information pertaining to the encountered web sites in 
database 400" - See [0032]) wherein the chunk map is employed to facilitate an 
incremental web re-crawl ("The system uses the auxiliary information to determine how 
often to conduct a recrawl. How often and when a recrawl is to be conducted may be 
determined statistically, heuristically, and/or by user input" - See [0033]). 

The chunk map taught by Evans is used to facilitate an incremental re-crawl. 
Evans does not teach that the properties stored in the chunk map are time between 
change and/or importance of items comprising a particular chunk. However, Najork 
teaches that associating items with similar time between change and importance in a 
particular subqueue having an associated priority level is useful for facilitating an 
incremental re-crawl ("documents that change more frequently should be assigned to a 
higher priority level subqueue, on the basis that pages that exhibit changes are likely to 
change again in the near future" - See Col. 11, lines 59-62; "a continuous web crawler 
having priority level subqueues that are used to maintain the freshness of document 
indices and other document based information databases" - See Col. 12, lines 22-25). 
Thus, it would have been obvious to one of ordinary skill in the art at the time the 
invention was made to include properties regarding time between change and/or 
importance of items in each respective chunk within the "auxiliary data" taught by Evans 
in order to facilitate a re-crawl. 

Regarding Claim 2, Najork teaches the items comprising information associated 
with a URL ("Queue elements for URL's to be downloaded are assigned a priority level, 
and then stored in the corresponding priority queue"- See Col. 3, line 67 & Col. 4, lines 
1-2). 

Regarding Claim 3, Najork teaches the items comprising at least one of an HTML 
file, a PDF file, a PS file, a PPT file, an XLS file and a DOC file ("Each web page on the 
world wide web has a distinct address called its uniform resource locator (URL), which 
identifies the location of the web page. Most of the documents on the world wide web 
are written in standard document description languages (e.g., HTML, XML)" - See Col. 
1, lines 20-25). 

Regarding Claim 4, Najork teaches the crawler being responsible for a specific 
set of Uniform Resource Locators (The queue entry pictured in Fig. 3 has a 
corresponding URL 144. Each queue entry corresponds to a document that must be 
crawled or re-crawled. The web crawler 102 is responsible for crawling all the 
documents associated with a respective URL in each of the queues). 

Regarding Claim 6, Najork teaches a master control process that serves as an 
interface between a crawler and a re-crawl controller (Queue Element Procedure 138). 

Regarding Claim 7, Najork teaches the master control process maintaining a 
known chunks table that stores information for components of a system ("a set of 
Queue Element handling procedures 138 for adding and deleting records of information 
to queue elements, and for adding and deleting name/value pairs to those records of 
information"- See Col. 5 lines 66-67 & Col. 6, lines 1-2). 

Regarding Claim 8, Najork teaches the master control process exposing an 
interface for communication with a component of the system (Communications Interface 
104 allows communication with other components of the system such as Web Page 
Indexing System 116). 

Regarding Claim 9, Najork teaches returning a list of chunks the component 
should have and where to get the chunks ("one processing module might determine 
whether the document has already been included in a web page index" - See Col. 1, 
lines 58-60). 

Regarding Claim 10, Najork teaches returning a list of the chunks that should be 
actively served by the component ("Another processing module might add or update a 
document's entry in the web page index" - See Col. 1, lines 62-64). 

Regarding Claim 11, Najork teaches returning a range of chunk identifiers to use 
in building a new chunk by the component ("a host-to-queue assignment table 132 for 
recording dynamic assignments of host identifiers to the queues 128"- See Col. 5, lines 
56-58). 

Regarding Claim 12, Evans teaches causing an old chunk to be retired by the 
system ("the system 100 conducts subsequent extensive searches (referred to as 
recrawl) of previously encountered web sites to update the database 400 (e.g., update a 
web site's respective site map, update the directory of encountered sites 412, delete a 
site map, delete a URL from the directory 412)"- See [0033]; The site map corresponds 
to a chunk of documents, as shown in Fig. 3 and described in paragraph [0025]). 

Regarding Claim 13, Najork teaches the master control process facilitating 
movement of chunks from one component to another component ("Multiple threads 
substantially concurrently process the document addresses in the queues" - See 
Abstract; "the SelectQueue procedure selects any one of those queues, removes it from 
the ordered set, and passes it to the calling thread" - See Col. 14, lines 52-54). 

Regarding Claim 14, Najork teaches the movement of chunks being based upon 
at least one of rebalancing index servers after one goes down, re-crawling pages 
previously crawled, and restoring a state of a crawler after it has crashed ("The threads 
then process the queue elements in the underlying queues"- See Col. 4, lines 5-6; 
"Many other examples of criteria 282 for assigning a priority level to a document's queue 
element can be devised by one of ordinary skill in the art, depending in large part on 
what information is stored in the document's download history and an assessment of 
which documents are the most important to refresh the most frequently"- See Col. 12, 
lines 1-6). 

Regarding Claim 15, Evans teaches a re-crawl component that employs the 
chunk map to determine which chunks, if any, to re-crawl at a particular time ("the 
system 100 stores auxiliary information pertaining to the encountered web sites in 
database 400"- See [0032]; "The system uses the auxiliary information to determine 
how often to conduct a recrawl. How often and when a recrawl is to be conducted may 
be determined statistically, heuristically, and/or by user input" - See [0033]). 

Regarding Claim 17, Evans teaches an index chunk that stores information 
associated with an index of at least some of the items ("The directory 412 of 
encountered web sites comprises the URL of each encountered web site for which a 
site map has been created and information pertaining to the content of each web site"- 
See [0027]). 

Regarding Claim 19, Najork teaches parsing a first chunk for uniform resource 
locators, re-crawling the uniform resource locators, and forming a second chunk based 
upon the re-crawled uniform resource locators ("Given a set of URL's, the web crawler 
102 enqueues the URL's into appropriate queues 128. Multiple threads 130 are used to 
dequeue URL's out of the queues 128, to download the corresponding documents or 
web pages from the world wide web and to extract any new URL's from the downloaded 
documents. Any new URL's are enqueued into the queues 128"- See Col. 6, lines 29- 
35; After the crawling is performed, any newly discovered URL's are placed into queues 
(chunks). The queues are modified from their previous state. Thus, at least one new 
chunk is formed based upon the re-crawled URL's). 

Najork does not explicitly teach the chunk maps stored in a chunk table being 
employed to determine the first chunk. Evans teaches chunk maps being employed to 
determine a first chunk (site maps 414, 416, 422 and 424 are stored in database 400; 
"the agent utilizes the directory and the respective site map to search only for relevant 
content (referred to as a focused crawl)" - See [0023]). 

The chunk map taught by Evans is used to determine a chunk to re-crawl. Evans 
does not teach that the properties stored in the chunk map are time between change 
and/or importance of items comprising a particular chunk. However, Najork teaches 
that associating items with similar time between change and importance in a particular 
subqueue having an associated priority level is useful for facilitating an incremental 
re-crawl ("documents that change more frequently should be assigned to a higher priority 
level subqueue, on the basis that pages that exhibit changes are likely to change again 
in the near future" - See Col. 11, lines 59-62; "a continuous web crawler having priority 
level subqueues that are used to maintain the freshness of document indices and other 
document based information databases"- See Col. 12, lines 22-25). Thus, it would 
have been obvious to one of ordinary skill in the art at the time the invention was made 
to include properties regarding time between change and/or importance of items in each 
respective chunk within the "auxiliary data" (chunk map) taught by Evans in order to 
determine a chunk to re-crawl. 

Regarding Claim 20, Evans teaches determining whether any chunks are to be 
retired ("the system 100 conducts subsequent extensive searches (referred to as 
recrawl) of previously encountered web sites to update the database 400 (e.g., update a 
web site's respective site map, update the directory of encountered sites 412, delete a 
site map, delete a URL from the directory 412)"- See [0033]; The site map corresponds 
to a chunk of documents, as shown in Fig. 3 and described in paragraph [0025]). 

Regarding Claim 21, Evans teaches one or more computer readable media 
having stored thereon computer executable instructions for carrying out the method of 
Claim 19 ("The present invention may be embodied in the form of 
computer-implemented processes and apparatus for practicing those processes. The present 
invention may also be embodied in the form of computer program code embodied in 
tangible media, such as floppy diskettes, read only memories (ROMs), CD-ROMs, hard 
drives, high density disk, or any other computer-readable storage medium, wherein, 
when the computer program code is loaded into and executed by a computer, the 
computer becomes an apparatus for practicing the invention"- See [0034]). 

Regarding Claims 22 and 23, Evans teaches accessing a chunk map containing 
properties associated with chunks of data as a result of one or more web crawls and 
periodically determining, based on the properties in the chunk map, whether to re-crawl 
one or more of the chunks of data ("the system 100 stores auxiliary information 
pertaining to the encountered web sites in database 400"- See [0032]; "The system 
uses the auxiliary information to determine how often to conduct a recrawl. How often 
and when a recrawl is to be conducted may be determined statistically, heuristically, 
and/or by user input"- See [0033]). 

Evans does not explicitly teach the chunk map containing properties associated 
with respective chunks of data, wherein the properties are at least one of average time 
between change or average importance of documents comprising a particular chunk. 
Najork teaches placing items with similar properties into respective chunks wherein the 
properties are at least one of average time between change or average importance of 
documents comprising a particular chunk ("the document is assigned to a priority level 
subqueue based on a predefined set of criteria 282 are satisfied, including but not 
limited to: ...the document's rate of change, based on (a) its modification date and time" 
- See Col. 11, lines 48-55; The documents (items) with similar importance (based on 
their rate of change) are placed in separate queues (chunks) with a corresponding 
priority level). 

The chunk map taught by Evans is used to determine whether a re-crawl should 
be performed. Evans does not teach that the properties stored in the chunk map are 
time between change and/or importance of items comprising a particular chunk. 
However, Najork teaches that associating items with similar time between change and 
importance in a particular subqueue having an associated priority level is useful for 
facilitating an incremental re-crawl ("documents that change more frequently should be 
assigned to a higher priority level subqueue, on the basis that pages that exhibit 
changes are likely to change again in the near future"- See Col. 11, lines 59-62; "a 
continuous web crawler having priority level subqueues that are used to maintain the 
freshness of document indices and other document based information databases" - 
See Col. 12, lines 22-25). Thus, it would have been obvious to one of ordinary skill in 
the art at the time the invention was made to include properties regarding time between 
change and/or importance of items in each respective chunk within the "auxiliary data" 
(chunk map) taught by Evans in order to determine whether a re-crawl should be 
performed. 

Regarding Claim 26, Najork teaches means for placing items with similar 
properties into respective chunks wherein the properties are at least one of average 
time between change or average importance of documents comprising a particular 
chunk ("the document is assigned to a priority level subqueue based on a predefined 
set of criteria 282 are satisfied, including but not limited to: ...the document's rate of 
change, based on (a) its modification date and time" - See Col. 11, lines 48-55; The 
documents (items) with similar importance (based on their rate of change) are placed in 
separate queues (chunks) with a corresponding priority level). 

Although Najork teaches properties being associated with their respective chunk, 
Najork does not teach storing these properties in a chunk map. 

Evans teaches means for storing properties associated with chunks ("the system 
100 stores auxiliary information pertaining to the encountered web sites in database 
400"- See [0032]) wherein the chunk map is employed to facilitate an incremental web 
re-crawl ("The system uses the auxiliary information to determine how often to conduct 
a recrawl. How often and when a recrawl is to be conducted may be determined 
statistically, heuristically, and/or by user input" - See [0033]). 

The means for storing properties associated with chunks taught by Evans is used 
to facilitate an incremental re-crawl. Evans does not teach that the properties stored in 
the chunk map are time between change and/or importance of items comprising a 
particular chunk. However, Najork teaches that associating items with similar time 
between change and importance in a particular subqueue having an associated priority 
level is useful for facilitating an incremental re-crawl ("documents that change more 
frequently should be assigned to a higher priority level subqueue, on the basis that 
pages that exhibit changes are likely to change again in the near future" - See Col. 11, 
lines 59-62; "a continuous web crawler having priority level subqueues that are used to 
maintain the freshness of document indices and other document based information 
databases" - See Col. 12, lines 22-25). Thus, it would have been obvious to one of 
ordinary skill in the art at the time the invention was made to include properties 
regarding time between change and/or importance of items in each respective chunk 
within the "auxiliary data" taught by Evans in order to facilitate a re-crawl. 

Regarding Claim 27, Najork teaches the items comprising information associated 
with a URL ("Queue elements for URL's to be downloaded are assigned a priority level, 
and then stored in the corresponding priority queue"- See Col. 3, line 67 & Col. 4, lines 
1-2). 

Regarding Claim 28, Najork teaches the items comprising at least one of an 
HTML file, a PDF file, a PS file, a PPT file, an XLS file and a DOC file ("Each web page 
on the world wide web has a distinct address called its uniform resource locator (URL), 
which identifies the location of the web page. Most of the documents on the world wide 
web are written in standard document description languages (e.g., HTML, XML)"- See 
Col. 1, lines 20-25). 

8. Claim 5 is rejected under 35 U.S.C. 103(a) as being unpatentable over Najork et 
al. (US 6,263,364) in view of Evans et al. (US 2004/0030683) and further in view of 
Eichstaedt et al. (US 6,182,085). 

Regarding Claim 5, Najork and Evans do not explicitly teach the system further 
comprising a master control process that can modify the chunk map to facilitate load 
balancing amongst a plurality of crawlers. However, Eichstaedt does teach a master 
control process that can modify the chunk map to facilitate load balancing amongst a 
plurality of crawlers ("A distributed collection of web-crawlers to gather information over 
a large portion of the cyberspace. These crawlers share the overall crawling through a 
cyberspace partition scheme. They also collaborate with each other through load 
balancing to maximally utilize the computing resources of each of the crawlers. The 
invention takes advantage of the hierarchical nature of the cyberspace namespace and 
uses the syntactic components of the URL structure as the main vehicle for dividing and 
assigning crawling workload to individual crawler"- See Abstract). It would have been 
obvious to one of ordinary skill in the art at the time the invention was made to have a 
control process that can facilitate load balancing amongst a plurality of crawlers. One 
would have been motivated to do so in order to maximally utilize the computing 
resources of each of the crawlers. 

9. Claim 18 is rejected under 35 U.S.C. 103(a) as being unpatentable over Najork 
et al. (US 6,263,364) in view of Evans et al. (US 2004/0030683) and further in view of 
Acharaya et al. (US 2007/0094255). 

Regarding Claim 18, Najork and Evans do not explicitly teach a ranking chunk 
that stores a static rank associated with an index chunk. However, Acharaya does 
teach a ranking chunk that stores a static rank associated with an index chunk 
("Ranking component 330 may assign a ranking score (also called simply a "score" 
herein) to one or more documents in document corpus 340"- See [0038]). It would 
have been obvious to one of ordinary skill in the art at the time the invention was made 
to store a rank associated with an index chunk. Motivation for doing so would be to 
improve search results generated in connection with a search query. 

Response to Arguments 

10. Applicant's arguments with respect to Claims 1, 19, 22, 24 and 26 have been 
considered but are moot in view of the new ground(s) of rejection. 

Conclusion 

11. Applicant's amendment necessitated the new ground(s) of rejection presented in 
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP 
§ 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 
CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Scott M. Sciacca whose telephone number is (571) 
270-1919. The examiner can normally be reached on Monday thru Friday, 7:30 A.M. - 5:00 
P.M. EST. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Jeff Pwu, can be reached on (571) 272-6798. The fax phone number for the 
organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 